Bagging and Boosting statistical machine translation systems
نویسندگان
چکیده
a r t i c l e i n f o a b s t r a c t In this article we address the issue of generating diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. Unlike traditional approaches, we do not resort to multiple structurally different SMT systems, but instead directly learn a strong SMT system from a single translation engine in a principled way. Our approach is based on Bagging and Boosting which are two instances of the general framework of ensemble learning. The basic idea is that we first generate an ensemble of weak translation systems using a base learning algorithm, and then learn a strong translation system from the ensemble. One of the advantages of our approach is that it can work with any of current SMT systems and make them stronger almost " for free ". Beyond this, most system combination methods are directly applicable to the proposed framework for generating the final translation system from the ensemble of weak systems. We evaluate our approach on Chinese–English translation in three state-of-the-art SMT systems, including a phrase-based system, a hierarchical phrase-based system and a syntax-based system. Experimental results on the NIST MT evaluation corpora show that our approach leads to significant improvements in translation accuracy over the baselines. More interestingly, it is observed that our approach is able to improve the existing system combination systems. The biggest improvements are obtained by generating weak systems using Bagging/Boosting, and learning the strong system using a state-of-the-art system combination method.
منابع مشابه
Combining Bagging and Additive Regression
Bagging and boosting are among the most popular resampling ensemble methods that generate and combine a diversity of regression models using the same learning algorithm as base-learner. Boosting algorithms are considered stronger than bagging on noisefree data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, in t...
متن کاملCombining Bagging and Boosting
Bagging and boosting are among the most popular resampling ensemble methods that generate and combine a diversity of classifiers using the same learning algorithm for the base-classifiers. Boosting algorithms are considered stronger than bagging on noisefree data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, i...
متن کاملWhen Efficient Model Averaging Out-Performs Boosting and Bagging
Bayesian model averaging also known as the Bayes optimal classifier (BOC) is an ensemble technique used extensively in the statistics literature. However, compared to other ensemble techniques such as bagging and boosting, BOC is less known and rarely used in data mining. This is partly due to model averaging being perceived as being inefficient and because bagging and boosting consistently out...
متن کاملBoosting-Based System Combination for Machine Translation
In this paper, we present a simple and effective method to address the issue of how to generate diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. Our method is based on the framework of boosting. First, a sequence of weak translation systems is generated from a baseline system in an iterative manner. Then, a strong translation sys...
متن کاملEnsemble Methods for Environmental Data Modelling with Support Vector Regression
This paper investigates the use of ensemble of predictors in order to improve the performance of spatial prediction methods. Support vector regression (SVR), a popular method from the field of statistical machine learning, is used. Several instances of SVR are combined using different data sampling schemes (bagging and boosting). Bagging shows good performance, and proves to be more computation...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Artif. Intell.
دوره 195 شماره
صفحات -
تاریخ انتشار 2013